---
title: "Parameter Recovery Analysis"
subtitle: "Foundational Report 4"
description: |
Validation that model parameters can be accurately recovered from
simulated data, demonstrating model identifiability.
categories: [foundations, validation, m_0]
execute:
cache: true
---
```{python}
#| label: setup
#| include: false
import sys
import os
# Add parent directories to path
sys.path.insert(0, os.path.join(os.getcwd(), '..'))
project_root = os.path.dirname(os.path.dirname(os.getcwd()))
sys.path.insert(0, project_root)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import json
import warnings
warnings.filterwarnings('ignore')
# Set random seed for reproducibility
np.random.seed(42)
```
## Introduction
Having examined the prior predictive distribution ([Report 3](03_prior_analysis.qmd)), we now ask: **can we recover the true parameters when we simulate data from our model with known parameters?** This is the fundamental question of parameter recovery analysis.
Parameter recovery is a critical validation step for any Bayesian model:
1. **Identifiability**: Can the model distinguish between different parameter configurations?
2. **Estimation accuracy**: Are posterior means close to true values?
3. **Uncertainty calibration**: Do 90% credible intervals contain the true value ~90% of the time?
4. **Precision**: How narrow are the credible intervals?
::: {.callout-warning}
## Preview: A Key Finding
This report reveals an asymmetry in parameter recovery: while α (sensitivity) is well-identified, the pair (β, δ) that determines the expected utility function shows weaker identification—posteriors concentrate more slowly relative to α. As shown formally in [Report 5](05_adding_risky_choices.qmd), this reflects a **structural identification challenge** where β and δ interact multiplicatively in the expected utility, causing uncertainty in one to propagate to the other. Importantly, this does not prevent good recovery of α, which remains our primary parameter of interest.
:::
::: {.callout-note}
## The Recovery Paradigm
Parameter recovery follows a simple logic:
1. **Simulate**: Generate data from `m_0_sim.stan` with known ("true") parameter values
2. **Estimate**: Fit `m_0.stan` to the simulated data
3. **Compare**: Assess how well posterior estimates match the true values
4. **Repeat**: Do this many times to characterize recovery performance
If the model is well-specified and identifiable, the posterior should concentrate around the true values.
:::
## Study Design
We use the same study design as in [Report 3](03_prior_analysis.qmd) to maintain consistency across our foundational analyses:
```{python}
#| label: study-design-config
#| echo: true
# Study design configuration (same as Report 3)
config = {
"M": 25, # Number of decision problems
"K": 3, # Number of consequences
"D": 5, # Feature dimensions
"R": 15, # Number of distinct alternatives
"min_alts_per_problem": 2, # Minimum alternatives per problem
"max_alts_per_problem": 5, # Maximum alternatives per problem
"feature_dist": "normal", # Feature distribution
"feature_params": {"loc": 0, "scale": 1}
}
print(f"Study Design Configuration:")
print(f" Decision problems (M): {config['M']}")
print(f" Consequences (K): {config['K']}")
print(f" Feature dimensions (D): {config['D']}")
print(f" Distinct alternatives (R): {config['R']}")
print(f" Alternatives per problem: {config['min_alts_per_problem']}-{config['max_alts_per_problem']}")
```
```{python}
#| label: generate-study-design
#| output: false
from utils.study_design import StudyDesign
# Create and generate the study design
study = StudyDesign(
M=config["M"],
K=config["K"],
D=config["D"],
R=config["R"],
min_alts_per_problem=config["min_alts_per_problem"],
max_alts_per_problem=config["max_alts_per_problem"],
feature_dist=config["feature_dist"],
feature_params=config["feature_params"],
design_name="parameter_recovery"
)
study.generate()
```
```{python}
#| label: design-summary
#| echo: false
# Extract design properties
w = np.array(study.w)
I = np.array(study.I)
N = I.sum(axis=1)
print(f"\nGenerated Study Design:")
print(f" Total alternatives across problems: {I.sum()}")
print(f" Alternatives per problem: min={N.min()}, max={N.max()}, mean={N.mean():.1f}")
```
## Parameter Recovery Analysis
We use the `ParameterRecovery` class to systematically evaluate how well parameters can be recovered. This class:
1. Uses `m_0_sim.stan` to generate data with known parameters drawn from the prior
2. Fits `m_0.stan` to each simulated dataset
3. Compares posterior estimates to true values across many iterations
```{python}
#| label: run-recovery
#| output: false
from analysis.parameter_recovery import ParameterRecovery
import tempfile
# Create output directory for this analysis
output_dir = tempfile.mkdtemp(prefix="param_recovery_")
# Initialize parameter recovery analysis
recovery = ParameterRecovery(
inference_model_path=None, # Uses default m_0.stan
sim_model_path=None, # Uses default m_0_sim.stan
study_design=study,
output_dir=output_dir,
n_mcmc_samples=1000, # Samples per chain
n_mcmc_chains=4, # Number of chains
n_iterations=50 # Number of simulation-recovery iterations
)
# Run the analysis
true_params_list, posterior_summaries = recovery.run()
```
```{python}
#| label: recovery-summary
#| echo: false
n_successful = len(true_params_list)
print(f"Parameter Recovery Analysis Complete:")
print(f" Successful iterations: {n_successful}")
print(f" Parameters recovered per iteration:")
print(f" - α (sensitivity): 1 parameter")
print(f" - β (feature weights): {config['K']} × {config['D']} = {config['K'] * config['D']} parameters")
print(f" - δ (utility increments): {config['K'] - 1} parameters")
```
## Recovery Metrics
We evaluate parameter recovery using four key metrics:
| Metric | Definition | Target |
|--------|------------|--------|
| **Bias** | Mean(estimate - true) | ≈ 0 |
| **RMSE** | √Mean((estimate - true)²) | Small |
| **Coverage** | P(true ∈ 90% CI) | ≈ 0.90 |
| **CI Width** | Mean(upper - lower) | Small but calibrated |
### Sensitivity Parameter ($\alpha$)
```{python}
#| label: fig-alpha-recovery
#| fig-cap: "Recovery of the sensitivity parameter α. Left: True vs. estimated values with identity line. Right: 90% credible intervals for each iteration, colored by whether they contain the true value."
# Extract alpha values
alpha_true = np.array([p['alpha'] for p in true_params_list])
alpha_mean = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries])
alpha_lower = np.array([s.loc['alpha', '5%'] for s in posterior_summaries])
alpha_upper = np.array([s.loc['alpha', '95%'] for s in posterior_summaries])
# Calculate metrics
alpha_bias = np.mean(alpha_mean - alpha_true)
alpha_rmse = np.sqrt(np.mean((alpha_mean - alpha_true)**2))
alpha_coverage = np.mean((alpha_true >= alpha_lower) & (alpha_true <= alpha_upper))
alpha_ci_width = np.mean(alpha_upper - alpha_lower)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# True vs Estimated
ax = axes[0]
ax.scatter(alpha_true, alpha_mean, alpha=0.7, s=60, c='steelblue', edgecolor='white')
lims = [min(alpha_true.min(), alpha_mean.min()) * 0.9,
max(alpha_true.max(), alpha_mean.max()) * 1.1]
ax.plot(lims, lims, 'r--', linewidth=2, label='Identity line')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel('True α', fontsize=12)
ax.set_ylabel('Estimated α (posterior mean)', fontsize=12)
ax.set_title(f'α Recovery: Bias={alpha_bias:.3f}, RMSE={alpha_rmse:.3f}', fontsize=12)
ax.legend()
ax.set_aspect('equal')
# Coverage plot
ax = axes[1]
iterations = np.arange(len(alpha_true))
for i in range(len(alpha_true)):
covered = (alpha_true[i] >= alpha_lower[i]) & (alpha_true[i] <= alpha_upper[i])
color = 'forestgreen' if covered else 'crimson'
ax.plot([i, i], [alpha_lower[i], alpha_upper[i]], color=color, linewidth=2, alpha=0.7)
ax.scatter(i, alpha_mean[i], color=color, s=40, zorder=3)
ax.scatter(iterations, alpha_true, color='black', s=60, marker='x',
label='True value', zorder=4, linewidth=2)
ax.set_xlabel('Iteration', fontsize=12)
ax.set_ylabel('α', fontsize=12)
ax.set_title(f'α: 90% Credible Intervals (Coverage = {alpha_coverage:.0%})', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"\nα Recovery Statistics:")
print(f" Bias: {alpha_bias:.4f}")
print(f" RMSE: {alpha_rmse:.4f}")
print(f" 90% CI Coverage: {alpha_coverage:.1%}")
print(f" Mean CI Width: {alpha_ci_width:.3f}")
```
### Feature-to-Probability Weights ($\boldsymbol{\beta}$)
The β matrix has K × D = 15 parameters (3 consequences × 5 features). We examine recovery across all of them:
```{python}
#| label: compute-beta-recovery
#| output: false
# Compute recovery metrics for all beta parameters
K, D = config['K'], config['D']
beta_recovery = []
for k in range(K):
for d in range(D):
param_name = f"beta[{k+1},{d+1}]"
beta_true = np.array([p['beta'][k][d] for p in true_params_list])
beta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries])
beta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries])
beta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries])
bias = np.mean(beta_mean - beta_true)
rmse = np.sqrt(np.mean((beta_mean - beta_true)**2))
coverage = np.mean((beta_true >= beta_lower) & (beta_true <= beta_upper))
ci_width = np.mean(beta_upper - beta_lower)
beta_recovery.append({
'parameter': param_name,
'k': k + 1,
'd': d + 1,
'bias': bias,
'rmse': rmse,
'coverage': coverage,
'ci_width': ci_width,
'true': beta_true,
'mean': beta_mean,
'lower': beta_lower,
'upper': beta_upper
})
beta_df = pd.DataFrame(beta_recovery)
```
```{python}
#| label: fig-beta-recovery-summary
#| fig-cap: "Summary of β parameter recovery. Left: RMSE for each β coefficient. Right: Coverage rates (target = 90%, shown as dashed line)."
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# RMSE heatmap
ax = axes[0]
rmse_matrix = beta_df.pivot(index='k', columns='d', values='rmse')
im = ax.imshow(rmse_matrix.values, cmap='Blues', aspect='auto')
ax.set_xlabel('Feature Dimension (d)', fontsize=12)
ax.set_ylabel('Consequence (k)', fontsize=12)
ax.set_title('β RMSE by Position', fontsize=12)
ax.set_xticks(range(D))
ax.set_xticklabels([str(d+1) for d in range(D)])
ax.set_yticks(range(K))
ax.set_yticklabels([str(k+1) for k in range(K)])
for i in range(K):
for j in range(D):
ax.text(j, i, f'{rmse_matrix.values[i, j]:.2f}', ha='center', va='center', fontsize=10)
plt.colorbar(im, ax=ax, label='RMSE')
# Coverage bar plot
ax = axes[1]
coverage_values = beta_df['coverage'].values
x = np.arange(len(coverage_values))
colors = ['forestgreen' if c >= 0.85 else 'orange' if c >= 0.75 else 'crimson'
for c in coverage_values]
ax.bar(x, coverage_values, color=colors, alpha=0.7, edgecolor='white')
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('β Parameter Index', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('β 90% CI Coverage by Parameter', fontsize=12)
ax.set_ylim(0, 1.05)
ax.legend()
plt.tight_layout()
plt.show()
print(f"\nβ Recovery Summary:")
print(f" Mean Bias: {beta_df['bias'].mean():.4f}")
print(f" Mean RMSE: {beta_df['rmse'].mean():.4f}")
print(f" Mean Coverage: {beta_df['coverage'].mean():.1%}")
print(f" Mean CI Width: {beta_df['ci_width'].mean():.3f}")
```
```{python}
#| label: fig-beta-true-vs-estimated
#| fig-cap: "True vs. estimated values for all β parameters pooled together. The identity line shows perfect recovery."
fig, ax = plt.subplots(figsize=(8, 8))
# Pool all beta values
all_beta_true = np.concatenate([r['true'] for r in beta_recovery])
all_beta_mean = np.concatenate([r['mean'] for r in beta_recovery])
ax.scatter(all_beta_true, all_beta_mean, alpha=0.4, s=30, c='purple', edgecolor='none')
lims = [min(all_beta_true.min(), all_beta_mean.min()) * 1.1,
max(all_beta_true.max(), all_beta_mean.max()) * 1.1]
ax.plot(lims, lims, 'r--', linewidth=2, label='Identity line')
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel('True β', fontsize=12)
ax.set_ylabel('Estimated β (posterior mean)', fontsize=12)
ax.set_title(f'β Recovery (all {K*D} parameters pooled)', fontsize=12)
ax.legend()
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
# Add correlation
corr = np.corrcoef(all_beta_true, all_beta_mean)[0, 1]
ax.text(0.05, 0.95, f'r = {corr:.3f}', transform=ax.transAxes, fontsize=12,
verticalalignment='top', bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
plt.tight_layout()
plt.show()
```
### Utility Increments ($\boldsymbol{\delta}$)
```{python}
#| label: fig-delta-recovery
#| fig-cap: "Recovery of utility increment parameters δ. Each panel shows true vs. estimated values and coverage for one δ component."
K_minus_1 = config['K'] - 1
fig, axes = plt.subplots(2, K_minus_1, figsize=(6 * K_minus_1, 10))
if K_minus_1 == 1:
axes = axes.reshape(2, 1)
delta_stats = []
for k in range(K_minus_1):
param_name = f"delta[{k+1}]"
delta_true = np.array([p['delta'][k] for p in true_params_list])
delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries])
delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries])
delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries])
# Metrics
bias = np.mean(delta_mean - delta_true)
rmse = np.sqrt(np.mean((delta_mean - delta_true)**2))
coverage = np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper))
ci_width = np.mean(delta_upper - delta_lower)
delta_stats.append({
'parameter': f'δ_{k+1}',
'bias': bias,
'rmse': rmse,
'coverage': coverage,
'ci_width': ci_width
})
# True vs Estimated
ax = axes[0, k]
ax.scatter(delta_true, delta_mean, alpha=0.7, s=60, c='forestgreen', edgecolor='white')
ax.plot([0, 1], [0, 1], 'r--', linewidth=2)
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(-0.05, 1.05)
ax.set_xlabel(f'True δ_{k+1}', fontsize=11)
ax.set_ylabel(f'Estimated δ_{k+1}', fontsize=11)
ax.set_title(f'δ_{k+1}: Bias={bias:.3f}, RMSE={rmse:.3f}', fontsize=11)
ax.set_aspect('equal')
# Coverage
ax = axes[1, k]
iterations = np.arange(len(delta_true))
for i in range(len(delta_true)):
covered = (delta_true[i] >= delta_lower[i]) & (delta_true[i] <= delta_upper[i])
color = 'forestgreen' if covered else 'crimson'
ax.plot([i, i], [delta_lower[i], delta_upper[i]], color=color, linewidth=2, alpha=0.7)
ax.scatter(i, delta_mean[i], color=color, s=40, zorder=3)
ax.scatter(iterations, delta_true, color='black', s=60, marker='x',
label='True value', zorder=4, linewidth=2)
ax.set_xlabel('Iteration', fontsize=11)
ax.set_ylabel(f'δ_{k+1}', fontsize=11)
ax.set_title(f'δ_{k+1}: Coverage = {coverage:.0%}', fontsize=11)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
delta_df = pd.DataFrame(delta_stats)
print(f"\nδ Recovery Summary:")
for _, row in delta_df.iterrows():
print(f" {row['parameter']}: Bias={row['bias']:.4f}, RMSE={row['rmse']:.4f}, Coverage={row['coverage']:.0%}")
```
## Summary Table
```{python}
#| label: tbl-recovery-summary
#| tbl-cap: "Parameter recovery statistics for all parameters in model m_0."
# Create comprehensive summary table
summary_rows = []
# Alpha
summary_rows.append({
'Parameter': 'α',
'Bias': alpha_bias,
'RMSE': alpha_rmse,
'Coverage': alpha_coverage,
'CI Width': alpha_ci_width,
'Notes': 'Sensitivity'
})
# Beta (aggregated)
summary_rows.append({
'Parameter': 'β (all)',
'Bias': beta_df['bias'].mean(),
'RMSE': beta_df['rmse'].mean(),
'Coverage': beta_df['coverage'].mean(),
'CI Width': beta_df['ci_width'].mean(),
'Notes': f'{K}×{D} parameters'
})
# Delta (individual)
for _, row in delta_df.iterrows():
summary_rows.append({
'Parameter': row['parameter'],
'Bias': row['bias'],
'RMSE': row['rmse'],
'Coverage': row['coverage'],
'CI Width': row['ci_width'],
'Notes': 'Utility increment'
})
summary_df = pd.DataFrame(summary_rows)
# Format for display
display_df = summary_df.copy()
display_df['Bias'] = display_df['Bias'].apply(lambda x: f'{x:.4f}')
display_df['RMSE'] = display_df['RMSE'].apply(lambda x: f'{x:.4f}')
display_df['Coverage'] = display_df['Coverage'].apply(lambda x: f'{x:.0%}')
display_df['CI Width'] = display_df['CI Width'].apply(lambda x: f'{x:.3f}')
print(display_df.to_string(index=False))
```
## Discussion
### Differential Recovery Across Parameters {#sec-utility-identification}
::: {.callout-note}
## Key Finding: Differential Parameter Recovery
The results above reveal an asymmetry in parameter recovery:
- **α (sensitivity)** is recovered well, with posterior means clustering tightly around true values
- **β (feature weights)** and **δ (utility increments)** show significantly weaker recovery, with wider credible intervals and lower precision compared to α
This pattern reflects the identification structure analyzed formally in [Report 5](05_adding_risky_choices.qmd): while α is globally identified, β and δ are only **weakly identified** through their composite mapping to expected utilities.
**Importantly for applications focused on estimating sensitivity (α):** The weaker identification of β and δ does not substantially impair α recovery. As the figures above show, α is well-recovered even when β and δ remain uncertain. If the primary research question concerns sensitivity to expected utility differences rather than the decomposition of expected utility into beliefs and values, the identification challenges for (β, δ) may be of secondary concern.
:::
To understand why β and δ are harder to recover than α, consider the structure of the model. In decisions under uncertainty, choices depend on **expected utilities**:
$$
\eta_r = \sum_{k=1}^K \psi_{rk} \upsilon_k
$$
where $\boldsymbol{\psi}_r$ are subjective probabilities (determined by β) and $\boldsymbol{\upsilon}$ are utilities (determined by δ). The key insight is that we only observe *rankings* of expected utilities through choices—not the utilities themselves, and not the separate contributions of beliefs and values.
This creates a multiplicative coupling between β and δ: different (β, δ) pairs can produce similar expected utility functions, making both parameters harder to pin down precisely:
- **α** directly governs how *sensitively* choices respond to expected utility differences—it is well-identified because choice probabilities depend monotonically on α for any fixed expected utility difference
- **β and δ** jointly determine expected utilities through their product; uncertainty in one propagates to the other
Note that this does not mean β and δ are completely uninformed by the data—the posteriors do concentrate relative to the prior, as the figures above show—but information accumulates more slowly than for α.
### Does Increasing Sample Size Help?
A natural question is whether the weaker δ recovery is simply a matter of insufficient data. (Note: β exhibits similar patterns to δ due to their structural coupling, so we focus on δ as representative of the utility-function parameters.) The study design with M=25 decision problems might not provide enough information. Let's test this by doubling the sample size to M=50.
To ensure a valid comparison, we use the `extend()` method to create the larger design from the original M=25 design. This keeps the same alternatives (features) and original problems, adding 25 new problems that use those same alternatives. Any differences in recovery can then be attributed to sample size rather than different feature configurations.
```{python}
#| label: config-m50
#| echo: true
# Extend the original study design to M=50
# This preserves the same alternatives and original problems
study_m50 = study.extend(additional_M=25, design_name="parameter_recovery_m50")
print(f"Extended Study Design (M=50):")
print(f" Decision problems: {study_m50.M} (original {study.M} + 25 new)")
print(f" Same R={study_m50.R} alternatives with identical features")
print(f" Total expected observations: ~{study_m50.M * 3.5:.0f} choices")
```
```{python}
#| label: run-recovery-m50
#| output: false
# Create output directory
output_dir_m50 = tempfile.mkdtemp(prefix="param_recovery_m50_")
# Run parameter recovery with larger sample
recovery_m50 = ParameterRecovery(
inference_model_path=None,
sim_model_path=None,
study_design=study_m50,
output_dir=output_dir_m50,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m50, posterior_summaries_m50 = recovery_m50.run()
```
```{python}
#| label: compare-sample-sizes
#| echo: false
# Compute recovery metrics for M=50
# Alpha
alpha_true_m50 = np.array([p['alpha'] for p in true_params_m50])
alpha_mean_m50 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m50])
alpha_lower_m50 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m50])
alpha_upper_m50 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m50])
alpha_rmse_m50 = np.sqrt(np.mean((alpha_mean_m50 - alpha_true_m50)**2))
alpha_coverage_m50 = np.mean((alpha_true_m50 >= alpha_lower_m50) & (alpha_true_m50 <= alpha_upper_m50))
alpha_ci_width_m50 = np.mean(alpha_upper_m50 - alpha_lower_m50)
# Delta (using config from original study)
K_minus_1 = config['K'] - 1
delta_stats_m50 = []
for k in range(K_minus_1):
param_name = f"delta[{k+1}]"
delta_true_m50 = np.array([p['delta'][k] for p in true_params_m50])
delta_mean_m50 = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m50])
delta_lower_m50 = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m50])
delta_upper_m50 = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m50])
delta_stats_m50.append({
'parameter': f'δ_{k+1}',
'rmse': np.sqrt(np.mean((delta_mean_m50 - delta_true_m50)**2)),
'coverage': np.mean((delta_true_m50 >= delta_lower_m50) & (delta_true_m50 <= delta_upper_m50)),
'ci_width': np.mean(delta_upper_m50 - delta_lower_m50)
})
delta_df_m50 = pd.DataFrame(delta_stats_m50)
```
```{python}
#| label: fig-sample-size-comparison
#| fig-cap: "Comparison of parameter recovery between M=25 and M=50. Doubling the number of decision problems improves α recovery but has a more modest effect on δ recovery. (β exhibits similar patterns to δ due to their structural coupling.)"
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# RMSE comparison
ax = axes[0]
params = ['α', 'δ₁', 'δ₂']
rmse_m25 = [alpha_rmse, delta_df.iloc[0]['rmse'], delta_df.iloc[1]['rmse']]
rmse_m50 = [alpha_rmse_m50, delta_df_m50.iloc[0]['rmse'], delta_df_m50.iloc[1]['rmse']]
x = np.arange(len(params))
width = 0.35
bars1 = ax.bar(x - width/2, rmse_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, rmse_m50, width, label='M=50', color='coral', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('RMSE by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# Coverage comparison
ax = axes[1]
cov_m25 = [alpha_coverage, delta_df.iloc[0]['coverage'], delta_df.iloc[1]['coverage']]
cov_m50 = [alpha_coverage_m50, delta_df_m50.iloc[0]['coverage'], delta_df_m50.iloc[1]['coverage']]
bars1 = ax.bar(x - width/2, cov_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, cov_m50, width, label='M=50', color='coral', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('90% CI Coverage by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# CI Width comparison
ax = axes[2]
ci_m25 = [alpha_ci_width, delta_df.iloc[0]['ci_width'], delta_df.iloc[1]['ci_width']]
ci_m50 = [alpha_ci_width_m50, delta_df_m50.iloc[0]['ci_width'], delta_df_m50.iloc[1]['ci_width']]
bars1 = ax.bar(x - width/2, ci_m25, width, label='M=25', color='steelblue', alpha=0.7)
bars2 = ax.bar(x + width/2, ci_m50, width, label='M=50', color='coral', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('90% CI Width by Sample Size', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
```
```{python}
#| label: tbl-sample-size-comparison
#| echo: false
# Create comparison table
comparison_data = []
# Alpha
comparison_data.append({
'Parameter': 'α',
'RMSE (M=25)': f'{alpha_rmse:.4f}',
'RMSE (M=50)': f'{alpha_rmse_m50:.4f}',
'Δ RMSE': f'{(alpha_rmse_m50 - alpha_rmse)/alpha_rmse * 100:+.1f}%',
'CI Width (M=25)': f'{alpha_ci_width:.3f}',
'CI Width (M=50)': f'{alpha_ci_width_m50:.3f}'
})
# Delta
for i in range(K_minus_1):
comparison_data.append({
'Parameter': f'δ_{i+1}',
'RMSE (M=25)': f'{delta_df.iloc[i]["rmse"]:.4f}',
'RMSE (M=50)': f'{delta_df_m50.iloc[i]["rmse"]:.4f}',
'Δ RMSE': f'{(delta_df_m50.iloc[i]["rmse"] - delta_df.iloc[i]["rmse"])/delta_df.iloc[i]["rmse"] * 100:+.1f}%',
'CI Width (M=25)': f'{delta_df.iloc[i]["ci_width"]:.3f}',
'CI Width (M=50)': f'{delta_df_m50.iloc[i]["ci_width"]:.3f}'
})
comparison_df = pd.DataFrame(comparison_data)
print("Sample Size Comparison (M=25 vs M=50):")
print(comparison_df.to_string(index=False))
```
### Does Adding New Alternatives Help?
The M=50 extension above adds problems over the **same** alternatives—it doesn't expand the feature space. But we might hypothesize that adding **new** alternatives (with new feature vectors) provides qualitatively different information that could help with δ recovery. (As with the previous section, β exhibits similar patterns to δ.)
To test this, we use `extend_with_alternatives()` to add 15 new alternatives along with 25 new problems that use those alternatives. This parallels how model m_1 (in [Report 5](05_adding_risky_choices.qmd)) adds new risky alternatives along with new risky problems, enabling a fairer comparison.
```{python}
#| label: config-m50-new-alts
#| echo: true
# Extend with both new alternatives AND new problems
# New problems only use the new alternatives (paralleling m_1's risky component)
study_m50_new_alts = study.extend_with_alternatives(
additional_M=25,
additional_R=15, # Same number of new alternatives as m_1 has risky alternatives
design_name="parameter_recovery_m50_new_alts",
new_problems_use_new_alts_only=True
)
print(f"Extended Study Design (M=50 with new alternatives):")
print(f" Decision problems: {study_m50_new_alts.M} (original {study.M} + 25 new)")
print(f" Alternatives: R={study_m50_new_alts.R} (original {study.R} + 15 new)")
print(f" Original problems (1-25) use original alternatives (1-15)")
print(f" New problems (26-50) use new alternatives (16-30)")
```
```{python}
#| label: run-recovery-m50-new-alts
#| output: false
# Create output directory
output_dir_m50_new_alts = tempfile.mkdtemp(prefix="param_recovery_m50_new_alts_")
# Run parameter recovery with new alternatives
recovery_m50_new_alts = ParameterRecovery(
inference_model_path=None,
sim_model_path=None,
study_design=study_m50_new_alts,
output_dir=output_dir_m50_new_alts,
n_mcmc_samples=1000,
n_mcmc_chains=4,
n_iterations=50
)
true_params_m50_new_alts, posterior_summaries_m50_new_alts = recovery_m50_new_alts.run()
```
```{python}
#| label: compare-all-extensions
#| echo: false
# Compute recovery metrics for M=50 with new alternatives
# Alpha
alpha_true_m50_new = np.array([p['alpha'] for p in true_params_m50_new_alts])
alpha_mean_m50_new = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m50_new_alts])
alpha_lower_m50_new = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m50_new_alts])
alpha_upper_m50_new = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m50_new_alts])
alpha_rmse_m50_new = np.sqrt(np.mean((alpha_mean_m50_new - alpha_true_m50_new)**2))
alpha_coverage_m50_new = np.mean((alpha_true_m50_new >= alpha_lower_m50_new) & (alpha_true_m50_new <= alpha_upper_m50_new))
alpha_ci_width_m50_new = np.mean(alpha_upper_m50_new - alpha_lower_m50_new)
# Delta
delta_stats_m50_new = []
for k in range(K_minus_1):
param_name = f"delta[{k+1}]"
delta_true_m50_new = np.array([p['delta'][k] for p in true_params_m50_new_alts])
delta_mean_m50_new = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m50_new_alts])
delta_lower_m50_new = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m50_new_alts])
delta_upper_m50_new = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m50_new_alts])
delta_stats_m50_new.append({
'parameter': f'δ_{k+1}',
'rmse': np.sqrt(np.mean((delta_mean_m50_new - delta_true_m50_new)**2)),
'coverage': np.mean((delta_true_m50_new >= delta_lower_m50_new) & (delta_true_m50_new <= delta_upper_m50_new)),
'ci_width': np.mean(delta_upper_m50_new - delta_lower_m50_new)
})
delta_df_m50_new = pd.DataFrame(delta_stats_m50_new)
```
```{python}
#| label: fig-three-way-comparison
#| fig-cap: "Three-way comparison of δ recovery: M=25 (base), M=50 (same alternatives), and M=50 (new alternatives). Adding new alternatives may provide different information than simply adding more problems over existing alternatives. (β exhibits similar patterns to δ due to their structural coupling.)"
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
params = ['δ₁', 'δ₂']
x = np.arange(len(params))
width = 0.25
# RMSE comparison
ax = axes[0]
rmse_m25_d = [delta_df.iloc[k]['rmse'] for k in range(K_minus_1)]
rmse_m50_d = [delta_df_m50.iloc[k]['rmse'] for k in range(K_minus_1)]
rmse_m50_new_d = [delta_stats_m50_new[k]['rmse'] for k in range(K_minus_1)]
bars1 = ax.bar(x - width, rmse_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, rmse_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, rmse_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('δ RMSE by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='y')
# Coverage comparison
ax = axes[1]
cov_m25_d = [delta_df.iloc[k]['coverage'] for k in range(K_minus_1)]
cov_m50_d = [delta_df_m50.iloc[k]['coverage'] for k in range(K_minus_1)]
cov_m50_new_d = [delta_stats_m50_new[k]['coverage'] for k in range(K_minus_1)]
bars1 = ax.bar(x - width, cov_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, cov_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, cov_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('δ 90% CI Coverage by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3, axis='y')
# CI Width comparison
ax = axes[2]
ci_m25_d = [delta_df.iloc[k]['ci_width'] for k in range(K_minus_1)]
ci_m50_d = [delta_df_m50.iloc[k]['ci_width'] for k in range(K_minus_1)]
ci_m50_new_d = [delta_stats_m50_new[k]['ci_width'] for k in range(K_minus_1)]
bars1 = ax.bar(x - width, ci_m25_d, width, label='M=25 (base)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x, ci_m50_d, width, label='M=50 (same alts)', color='coral', alpha=0.7)
bars3 = ax.bar(x + width, ci_m50_new_d, width, label='M=50 (new alts)', color='mediumseagreen', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('δ 90% CI Width by Extension Type', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
```
```{python}
#| label: tbl-three-way-comparison
#| echo: false
# Create three-way comparison table with better formatting
from tabulate import tabulate
three_way_data = []
# Alpha
three_way_data.append([
'α',
f'{alpha_rmse:.4f}',
f'{alpha_rmse_m50:.4f}',
f'{alpha_rmse_m50_new:.4f}',
f'{alpha_ci_width:.3f}',
f'{alpha_ci_width_m50:.3f}',
f'{alpha_ci_width_m50_new:.3f}'
])
# Delta
for i in range(K_minus_1):
three_way_data.append([
f'δ_{i+1}',
f'{delta_df.iloc[i]["rmse"]:.4f}',
f'{delta_df_m50.iloc[i]["rmse"]:.4f}',
f'{delta_stats_m50_new[i]["rmse"]:.4f}',
f'{delta_df.iloc[i]["ci_width"]:.3f}',
f'{delta_df_m50.iloc[i]["ci_width"]:.3f}',
f'{delta_stats_m50_new[i]["ci_width"]:.3f}'
])
headers = ['Param', 'RMSE\nM=25', 'RMSE\nM=50 (same)', 'RMSE\nM=50 (new)',
'CI Width\nM=25', 'CI Width\nM=50 (same)', 'CI Width\nM=50 (new)']
print("Three-Way Extension Comparison:")
print(tabulate(three_way_data, headers=headers, tablefmt='simple', stralign='right'))
```
### Interpretation: Sample Size and Alternative Effects
The three-way comparison reveals important patterns about what helps δ recovery (β exhibits similar patterns due to the structural coupling between these parameters):
::: {.callout-note}
## Extension Type Matters
Comparing the three conditions:
1. **M=25 (base)**: Baseline recovery with modest δ precision
2. **M=50 (same alternatives)**: Adding 25 problems over the same R=15 alternatives
3. **M=50 (new alternatives)**: Adding 25 problems using 15 **new** alternatives (R increases to 30)
Key findings:
- **α recovery** improves with more data regardless of extension type
- **δ recovery** may benefit differently from new alternatives vs. more problems over existing alternatives
- Adding new feature vectors expands the feature space, potentially providing different information about how features map to subjective probabilities
The comparison between conditions 2 and 3 is particularly important for understanding what helps δ recovery. If new alternatives (condition 3) outperform same alternatives (condition 2), it suggests that expanding the feature space provides qualitatively different information.
:::
This three-way comparison sets up a crucial comparison for [Report 5](05_adding_risky_choices.qmd), where we add risky alternatives (lotteries with known probabilities) instead of additional uncertain alternatives. By comparing:
- **M=50 with new uncertain alternatives** (this report): Expands uncertain feature space
- **M=25 + N=25 with new risky alternatives** (Report 5): Adds known-probability lotteries
...we can isolate whether improvements come from the *type* of data (risk vs. uncertainty) or simply from adding more diverse choice problems.
The asymmetry in learning rates for (β, δ) versus α reflects the model structure: in decisions under uncertainty, utilities and subjective probabilities enter the likelihood through their product (expected utilities). This creates a multiplicative coupling that makes (β, δ) harder to pin down than α, which more directly governs choice sensitivity.
### Study Design Parameters in m_0
The `m_0.stan` data block defines several parameters that characterize study size:
| Parameter | Description | Current Value |
|-----------|-------------|---------------|
| M | Number of decision problems | 25 → 50 |
| K | Number of consequences | 3 |
| D | Feature dimensions | 5 |
| R | Distinct alternatives | 15 |
We might consider varying other parameters:
- **Increasing K** (more consequences): Would require estimating more δ parameters, potentially making recovery harder
- **Increasing R** (more alternatives): Provides more information about β, and may indirectly help δ through better-constrained expected utilities
- **Increasing D** (more features): Similar to increasing R—primarily helps with β
The effectiveness of these changes for (β, δ) recovery is an empirical question worth exploring in future work.
### An Alternative Approach: Adding Risky Choice Data
Decision theory offers a principled alternative to simply increasing sample size: adding **risky choice data** where probabilities are known to the decision-maker. This approach, motivated by the Anscombe-Aumann framework, provides a different type of information that may help constrain utility parameters more directly.
In risky decisions, the probabilities are fixed by the experimenter, so choices directly reveal information about utilities without the partial confounding with subjective probabilities that occurs in decisions under uncertainty.
::: {.callout-note}
## Preview of Model m_1
Model m_1 extends m_0 by adding N risky choice problems where:
- Alternatives are lotteries with experimenter-specified probabilities
- The same utility function υ (and hence δ) governs both choice types
- Risky choices provide more direct information about utility structure
This provides an alternative route to improving (β, δ) recovery that may be more efficient than simply increasing M. See [Report 5](05_adding_risky_choices.qmd) for the full development and comparison.
:::
## Conclusion
Parameter recovery analysis for model m_0 reveals differential recovery across parameters:
1. **Well-recovered parameter**: α (sensitivity) can be recovered with good precision
2. **Slower-learning parameters**: β (feature weights) and δ (utility increments) show wider uncertainty, though they do respond to data—posteriors concentrate relative to priors, but more slowly than for α
3. **Sample size helps, but differentially**: Doubling M improves α recovery more than (β, δ) recovery
This pattern reflects the identification structure of the SEU model analyzed formally in [Report 5](05_adding_risky_choices.qmd): while α is globally identified, β and δ interact multiplicatively in the expected utility, causing uncertainty in one to propagate to the other.
**Implications for applications:**
- If the primary research question concerns sensitivity (α)—how strongly choices respond to expected utility differences—model m_0 performs well even with modest sample sizes
- If precise decomposition of expected utility into beliefs (β) and values (δ) is needed, either larger samples or the addition of risky choice data (model m_1) will be required
**Two paths forward** for improving (β, δ) recovery:
1. **More data**: Larger M (and perhaps R) may eventually yield precise (β, δ) estimates, though the rate of improvement is slower than for α
2. **Different data**: Adding risky choices (model m_1) provides a complementary data source where probabilities are known, breaking the multiplicative coupling and constraining utilities more directly
The choice between these approaches—or their combination—depends on practical considerations like study feasibility and the relative importance of precise utility estimation for the research question at hand.